Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(new transform): Aggregate transform to reduce metric volume maintaining data #7846

Merged
merged 36 commits into from
Jun 25, 2021

Conversation

ross
Copy link
Contributor

@ross ross commented Jun 11, 2021

The transform combines (add's) Incremental metric events and uses "last one wins" semantics for Aggregates. This is similar to how statsd works, but it does not (yet) support things like continuing to emit 0's for metrics it's previously seen.

  • Testing
  • TODO discussion & address
  • Documentation
  • Anything else that comes up in discussion and review

/cc #676 which requested statsd functionality
/cc #3668 which is functionality that could be built on top of this PR if desired.

@ross ross marked this pull request as ready for review June 13, 2021 21:36
@ross ross requested review from a team, bruceg and pablosichert and removed request for a team June 13, 2021 21:36
ross added 2 commits June 13, 2021 16:39
Signed-off-by: Ross McFarland <[email protected]>
@spencergilbert
Copy link
Contributor

Seems reasonable to me, but I'll leave the TODO's/questions to others

Copy link
Contributor

@tobz tobz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this is looking pretty good, but a few things are jumping out at the moment.

src/transforms/aggregate.rs Outdated Show resolved Hide resolved
src/transforms/aggregate.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@tobz tobz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting closer!

src/transforms/aggregate.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@tobz tobz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks good. There's still the open question about event metadata which I'll get an answer to, but other than that, we just need to nail down the performance/polish.

src/transforms/aggregate.rs Outdated Show resolved Hide resolved
src/transforms/aggregate.rs Outdated Show resolved Hide resolved
src/transforms/aggregate.rs Outdated Show resolved Hide resolved
docs/reference/components/transforms/aggregate.cue Outdated Show resolved Hide resolved
docs/reference/components/transforms/aggregate.cue Outdated Show resolved Hide resolved
docs/reference/components/transforms/aggregate.cue Outdated Show resolved Hide resolved
src/transforms/aggregate.rs Outdated Show resolved Hide resolved
src/transforms/aggregate.rs Outdated Show resolved Hide resolved
src/transforms/aggregate.rs Outdated Show resolved Hide resolved
ross added 2 commits June 16, 2021 18:25
Signed-off-by: Ross McFarland <[email protected]>
Signed-off-by: Ross McFarland <[email protected]>
Copy link
Contributor Author

@ross ross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the changes done. There's a couple where there's still outstanding questions before I could do anything with them. I think this is pretty shippable as-is.

src/transforms/aggregate.rs Outdated Show resolved Hide resolved
src/transforms/aggregate.rs Outdated Show resolved Hide resolved
src/transforms/aggregate.rs Outdated Show resolved Hide resolved
where
Self: 'static,
{
let mut me = self;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

src/transforms/aggregate.rs Outdated Show resolved Hide resolved
metric::MetricKind::Incremental => {
match self.map.get_mut(&series) {
// We already have something, add to it, will update timestamp as well.
Some(existing) => existing.update(data),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now emit!s an internal metric when this happens.

src/transforms/aggregate.rs Outdated Show resolved Hide resolved
src/transforms/aggregate.rs Outdated Show resolved Hide resolved
src/transforms/aggregate.rs Outdated Show resolved Hide resolved
src/transforms/aggregate.rs Outdated Show resolved Hide resolved
@ross ross requested a review from a team June 19, 2021 00:06
@ross
Copy link
Contributor Author

ross commented Jun 19, 2021

For some of the sinks, it is simply not possible to avoid. In particular, the Prometheus text and remote_write formats can only expose one metric per time series, and so they require some aggregation before that. The prometheus_exporter sink avoids using one of the standard buffers entirely because it handles it in a different way internally.

Yeah it makes sense when there's a requirement/limitation downstream as described here, but still don't think I'd want it to be happening implicitly when not forced to.

I agree this transform is valuable to Vector, and I hope my discussion did not imply otherwise. The ability to use it in a transform pipeline is indeed an advantage.

🆒

Copy link
Member

@bruceg bruceg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks much better. Down to a couple of minor tweaks from me.

src/transforms/aggregate.rs Outdated Show resolved Hide resolved
}

fn flush_into(&mut self, output: &mut Vec<Event>) {
if self.map.len() > 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this condition add any value? If the len is zero, the map drain will be a no-op too. I would also expect clippy to gripe about this should use is_empty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd assume it's super minor if at all. It was more meaningful back when the code was swapping out maps. Will remove.

docs/reference/components/transforms/aggregate.cue Outdated Show resolved Hide resolved
ross and others added 4 commits June 22, 2021 08:49
@jszwedko jszwedko requested review from bruceg and tobz June 23, 2021 18:49
Copy link
Member

@bruceg bruceg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I am good with this except for the question about handling update failures.

Comment on lines 79 to 82
if !existing.0.update(&data) {
emit!(AggregateUpdateFailed);
}
existing.1.merge(metadata);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other sinks, when the update fails, we drop the older metric instead of the newer one as is being done here. I think that would be the more sensible behavior, but I am open to your thoughts. Suggestion:

Suggested change
if !existing.0.update(&data) {
emit!(AggregateUpdateFailed);
}
existing.1.merge(metadata);
if !existing.0.update(&data) {
emit!(AggregateUpdateFailed);
*existing = (data, metadata);
} else {
existing.1.merge(metadata);
}

Note that the assignment causes an implicit drop of the metadata, which is reasonable given that we are also dropping the data as well.

Copy link
Contributor Author

@ross ross Jun 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Didn't know the precedent and in a vacuum either option seemed equally bad so I went with the simpler one. Will update now and add in a test for the behavior. Also removed the ! and flipped the branches to avoid the non-essential not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually @bruceg, it looks like MetricData.update succeeds when the kind changes (glad i tried to write tests for that case) so that code is "add"ing Incremental and Absolute values which doesn't really make sense. I looked into the metrics buffer and I think it's checking for type matches before doing the add. Kind of seems like update should do the add or replace logic based on kind so that all the users of that method don't have to check and handle that case, but for now I'll rework this to check kind first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Change made in the spirit of your request and the edge cases are now tested and behaving as expected/intended.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is a new pattern so it wouldn't have been obvious.

Good catch, yes update doesn't check the metric kind. It looks like the logic this transform wants and the logic for the metrics buffer is different enough that I'm not clear what the proper semantics should be. I'll have to give that more thought.

Copy link
Contributor Author

@ross ross Jun 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If nothing else it could fail the update, do nothing and return false. Assuming something elsewhere isn't expecting/wanting it to add things with non-matching kind. That would let aggregate use it as is. Would have to dig deeper to see what metrics buffer is doing, but I don't think failing the update would hurt it since it would check before sending it and never get there in that case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was thinking more along the lines of update possibly taking the new data by value instead of reference and returning it back in the case of an error as Result<(), MetricData>, to avoid needing to clone data inside update. On the other hand, the metrics buffer wants to be able to do add incrementals to both kinds, while this transform wants to replace for the case of different kinds, so that's hard to shoehorn into one method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess it's not clear to me how adding Absolutes makes sense given they're intended to be the value as-is, e.g. if you're going 55mph at t0 and 57mpg at t1, 112mph isn't a meaningful number. It seems like kind and Counter/Guage kind of overlap semantic wise and can result in internally conflicting states.

Anyway, not really related to this PR so I can let it go 😁

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you can't add absolutes, but you can update an absolute to a new absolute value. You can add an incremental to either kind as well. Agreed it's not related to this PR.

ross added 2 commits June 23, 2021 21:24
Signed-off-by: Ross McFarland <[email protected]>
Signed-off-by: Ross McFarland <[email protected]>
@tobz
Copy link
Contributor

tobz commented Jun 24, 2021

95 comments, 36 commits... hooo boy. :)

Appreciate you sticking with us on this one. 👍🏻

@bruceg bruceg enabled auto-merge (squash) June 24, 2021 18:55
@bruceg bruceg disabled auto-merge June 25, 2021 21:23
@bruceg bruceg enabled auto-merge (squash) June 25, 2021 21:24
@bruceg bruceg changed the title enhancement: Aggregate transform to reduce metric volume maintaining data feat(new transform): Aggregate transform to reduce metric volume maintaining data Jun 25, 2021
@jszwedko jszwedko disabled auto-merge June 25, 2021 21:53
@jszwedko jszwedko enabled auto-merge (squash) June 25, 2021 21:55
@jszwedko jszwedko disabled auto-merge June 25, 2021 21:55
@jszwedko jszwedko merged commit fe84032 into vectordotdev:master Jun 25, 2021
@binarylogic
Copy link
Contributor

Thank you @ross! This is a great contribution. As @tobz, appreciate you sticking with us.


examples: [
{
title: "Aggregate over 15 seconds"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that this says 15 seconds but the interval_ms below is 5000.

@ross ross deleted the transforms-aggregate branch June 25, 2021 22:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants